My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013

Author

  • Jürgen Schmidhuber
Abstract

Deep Learning has attracted significant attention in recent years. Here I present a brief overview of my first Deep Learner of 1991 and its historical context, with a timeline of Deep Learning highlights. Note: As a machine learning researcher I am obsessed with proper credit assignment. This draft is the result of an experiment in rapid massive open online peer review. Since 20 September 2013, subsequent revisions published under www.deeplearning.me have absorbed many suggestions for improvements by experts. The abbreviation "TL" is used to refer to subsections of the timeline section.

Figure 1: My first Deep Learning system of 1991 used a deep stack of recurrent neural networks (a Neural Hierarchical Temporal Memory) pre-trained in unsupervised fashion to accelerate subsequent supervised learning [79, 81, 82].

In 2009, our Deep Learning Artificial Neural Networks became the first Deep Learners to win official international pattern recognition competitions [40, 83] (with secret test sets known only to the organisers); by 2012 they had won eight of them (TL 1.13), including the first contests on object detection in large images (ICPR 2012) [2, 16] and image segmentation (ISBI 2012) [3, 15]. In 2011, they achieved the world's first superhuman visual pattern recognition results [20, 19]. Others have implemented very similar techniques, e.g., [51], and won additional contests or set benchmark records since 2012, e.g., (TL 1.13, TL 1.14). The field of Deep Learning research is far older though; compare the timeline (TL) further down.

My first Deep Learner dates back to 1991 [79, 81, 82] (TL 1.7). It can perform credit assignment across hundreds of nonlinear operators or neural layers, by using unsupervised pre-training for a stack of recurrent neural networks (RNN) (deep by nature) as in Figure 1.
Such RNN are general computers more powerful than normal feedforward NN, and can encode entire sequences of input vectors. The basic idea is still relevant today: each RNN is trained for a while in unsupervised fashion to predict its next input. From then on, only unexpected inputs (errors) convey new information and get fed to the next higher RNN, which thus ticks on a slower, self-organising time scale. It can easily be shown that no information gets lost; it just gets compressed (much of machine learning is essentially about compression). We get less and less redundant input sequence encodings in deeper and deeper levels of this hierarchical temporal memory, which compresses data in both space (like feedforward NN) and time. There is also a continuous variant [84].

One ancient illustrative Deep Learning experiment of 1993 [82] required credit assignment across 1200 time steps, or through 1200 subsequent nonlinear virtual layers. The top-level code of the initially unsupervised RNN stack, however, got so compact that (previously infeasible) sequence classification through additional supervised learning became possible.

There is a way of compressing higher levels down into lower levels, thus partially collapsing the hierarchical temporal memory. The trick is to retrain lower-level RNN to continually imitate (predict) the hidden units of already trained, slower, higher-level RNN, through additional predictive output neurons [81, 79, 82]. This helps the lower RNN to develop appropriate, rarely changing memories that may bridge very long time lags.

The Deep Learner of 1991 was a first way of overcoming the Fundamental Deep Learning Problem identified and analysed in 1991 by my very first student (now professor) Sepp Hochreiter (TL 1.6): the problem of vanishing or exploding gradients [46, 47, 10]. The latter motivated all our subsequent Deep Learning research of the 1990s and 2000s.
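To make the "pass up only unexpected inputs" principle concrete, here is a toy sketch. It is my own minimal illustration, not the original 1991 RNN implementation: each level's predictor RNN is replaced by a simple frequency-based next-symbol predictor, and the class and function names are invented for the example. What it preserves is the architecture's key property: a level absorbs whatever it can predict and forwards only surprises to the level above.

```python
# Toy history compressor: each level learns to predict its next input symbol
# and forwards only the symbols it failed to predict to the level above.
# (Illustration only: a frequency table stands in for each predictor RNN.)
from collections import defaultdict


class Level:
    def __init__(self):
        # counts[prev][sym]: how often sym followed prev at this level
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def step(self, sym):
        """Return sym if it was unexpected (to be passed up), else None."""
        surprised = True
        if self.prev is not None:
            succ = self.counts[self.prev]
            if succ and max(succ, key=succ.get) == sym:
                surprised = False          # predicted correctly: absorb it
            succ[sym] += 1                 # learn online from every symbol
        self.prev = sym
        return sym if surprised else None


def compress(seq, depth=2):
    """Feed seq through a stack of levels; return the stream at each depth."""
    levels = [Level() for _ in range(depth)]
    streams = [list(seq)] + [[] for _ in range(depth)]
    for d, lvl in enumerate(levels):
        for s in streams[d]:
            up = lvl.step(s)
            if up is not None:
                streams[d + 1].append(up)
    return streams


streams = compress("abababababcabababab")
```

On this mostly periodic input, the first level quickly learns the a/b alternation and forwards only the startup symbols and the surprising "c" (with its immediate aftermath) to the level above, so the higher-level stream is far shorter than the input, yet the surprising events survive. This mirrors the claim in the text that no information is lost, only compressed.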
Through supervised LSTM RNN (1997) (e.g., [48, 32, 39, 36, 37, 40, 38, 83], TL 1.8) we could eventually perform feats similar to those of the 1991 system [81, 82], overcoming the Fundamental Deep Learning Problem without any unsupervised pre-training. Moreover, LSTM could also learn tasks unlearnable by the partially unsupervised 1991 chunker [81, 82]. Particularly successful are stacks of LSTM RNN [40] trained by Connectionist Temporal Classification (CTC) [36].

On faster computers of 2009, this became the first RNN system ever to win an official international pattern recognition competition [40, 83], through the work of my PhD student and postdoc Alex Graves, e.g., [40]. To my knowledge, this was also the first Deep Learning system ever (recurrent or not) to win such a contest (TL 1.10). (In fact, it won three different ICDAR 2009 contests on connected handwriting in three different languages, e.g., [83, 40], TL 1.10.) A while ago, Alex moved on to Geoffrey Hinton's lab (Univ. Toronto), where a stack [40] of our bidirectional LSTM RNN [39] also broke a famous TIMIT speech recognition record [38] (TL 1.14), despite thousands of man-years previously spent on HMM-based speech recognition research. CTC-LSTM also helped to score first at NIST's OpenHaRT2013 evaluation [11]. Recently, well-known entrepreneurs also got interested [43, 52] in such hierarchical temporal memories [81, 82] (TL 1.7).

The expression Deep Learning was actually coined relatively late, around 2006, in the context of unsupervised pre-training for less general feedforward networks [44] (TL 1.9). Such a system reached 1.2% error rate [44] on the MNIST handwritten digits [54, 55], perhaps the most famous benchmark of Machine Learning.
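The CTC objective mentioned above can be illustrated with a toy dynamic-programming sketch. The code below is my own minimal illustration (function name and toy inputs are invented for the example, and it is not the code used in the cited systems): it computes the CTC negative log-likelihood of a label sequence given per-frame symbol probabilities, using the standard forward recursion over the blank-interleaved label sequence, which sums the probabilities of all frame-level paths that collapse to the target labels.

```python
# Toy CTC forward algorithm (no scaling or log-space tricks, so it is only
# suitable for very short sequences). probs[t][k] = P(symbol k at frame t).
import math


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Negative log of the total probability of all paths collapsing to labels."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]                  # blank-interleaved: ^ a ^ b ^ ...
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]         # start with blank ...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]     # ... or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]            # stay on the same symbol
            if s > 0:
                a += alpha[t - 1][s - 1]   # advance by one
            # skip the blank between two distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(total)
```

For example, with two frames of uniform probabilities over {blank, 'a'} and target ['a'], the paths blank-a, a-blank, and a-a all collapse to "a", so the likelihood is 3 × 0.25 = 0.75 and the loss is −log 0.75. In the real systems this quantity is differentiated with respect to the RNN outputs to train the stack end to end.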
Our team first showed that good old backpropagation (TL 1.2) on GPUs (with training pattern distortions [6, 86] but without any unsupervised pre-training) can actually achieve a three times better result of 0.35% [17] back then, a world record (a previous standard net achieved 0.7% [86]; a backprop-trained [54, 55] Convolutional NN (CNN or convnet) [29, 30, 54, 55] got 0.39% [70] (TL 1.9);


Journal:
  • CoRR

Volume abs/1312.5548  Issue

Pages  -

Publication date 2013